@baogorek baogorek commented Sep 10, 2025

Summary

Implements congressional district (CD) level estimation through a geo-stacking calibration framework with a hierarchical targets database.

Key Features

Targets Database Infrastructure

  • SQLite database for storing and managing calibration targets at multiple geographic levels (national, state, CD, county)
  • Hierarchical validation ensuring child targets sum to parent totals
  • ETL pipelines for:
    • IRS SOI data (state and county AGI distributions)
    • Medicaid enrollment by state
    • SNAP participation by state
    • Age distributions
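
The hierarchical validation described above (child targets summing to parent totals) could look roughly like the sketch below. The `targets` table, its columns (`geo_id`, `parent_id`, `variable`, `value`), and the sample rows are assumptions made for illustration; the PR's actual schema is not shown here.

```python
import sqlite3


def find_inconsistent_parents(conn, rel_tol=1e-6):
    """Return (geo_id, variable, parent_value, child_sum) where children do not sum to the parent."""
    query = """
        SELECT p.geo_id, p.variable, p.value, SUM(c.value)
        FROM targets AS p
        JOIN targets AS c
          ON c.parent_id = p.geo_id AND c.variable = p.variable
        GROUP BY p.geo_id, p.variable, p.value
    """
    bad = []
    for geo_id, variable, parent_value, child_sum in conn.execute(query):
        # Compare with a relative tolerance so rounding in source data is not flagged.
        if abs(child_sum - parent_value) > rel_tol * max(abs(parent_value), 1.0):
            bad.append((geo_id, variable, parent_value, child_sum))
    return bad


# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE targets (geo_id TEXT, parent_id TEXT, variable TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO targets VALUES (?, ?, ?, ?)",
    [
        ("US", None, "snap_households", 300.0),
        ("CA", "US", "snap_households", 200.0),
        ("TX", "US", "snap_households", 100.0),
        ("CA-01", "CA", "snap_households", 120.0),
        ("CA-02", "CA", "snap_households", 70.0),  # 120 + 70 != 200
    ],
)
violations = find_inconsistent_parents(conn)
```

Geographies without children (here `TX`) never appear in the join, so they are skipped rather than reported as mismatches.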

Geo-Stacking Calibration

  • Sparse matrix implementation for efficient CD-level weight calibration
  • Stratified sampling approach preserving household structure
  • Support for both state-only and state+CD stacking
  • Comprehensive validation metrics and diagnostics
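
To illustrate why a sparse representation helps here, the sketch below assumes one particular layout: every geography gets its own copy of each household, so a target for one CD only touches that CD's block of weight columns. That layout, the function names, and the sample values are assumptions for illustration, not the PR's implementation, and a plain dictionary stands in for a real sparse-matrix library.

```python
def build_stacked_matrix(household_values, geographies):
    """Build a targets-by-weights matrix in coordinate form.

    Assumed layout: column g * n_households + h holds household h's weight
    in geography g, and row g is the target for geography g.
    """
    n_households = len(household_values)
    entries = {}  # (row, col) -> value; missing keys are zeros
    for g, _geo in enumerate(geographies):
        for h, value in enumerate(household_values):
            if value != 0:
                entries[(g, g * n_households + h)] = value
    shape = (len(geographies), len(geographies) * n_households)
    return entries, shape


def estimate(entries, shape, weights):
    """Multiply the sparse matrix by a weight vector: one estimate per target row."""
    totals = [0.0] * shape[0]
    for (row, col), value in entries.items():
        totals[row] += value * weights[col]
    return totals


snap = [1, 0, 1]          # hypothetical household-level SNAP indicator
cds = ["CA-01", "CA-02"]  # hypothetical geography labels
entries, shape = build_stacked_matrix(snap, cds)
weights = [10.0, 5.0, 2.0, 4.0, 8.0, 6.0]  # one weight per (geography, household)
estimates = estimate(entries, shape, weights)
```

Only 4 of the 12 cells are stored in this toy case. Under the assumed layout the number of nonzeros grows with households times geographies, while a dense matrix would grow with households times geographies squared.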

Quality Assurance

  • Holdout validation framework for model evaluation
  • Household-level tracing for debugging weight assignments
  • Hierarchical consistency checks across geographic levels
  • Weight diagnostics and distribution analysis
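
A holdout check of the kind listed above can be sketched as follows: some targets are withheld from calibration and used only to score the result. The target names, the 20% split, and the stand-in estimates are all hypothetical; the PR's own framework may split and score differently.

```python
import random


def split_targets(names, holdout_fraction=0.2, seed=0):
    """Randomly partition target names into (fit, holdout) lists."""
    shuffled = sorted(names)  # sort first so the split is reproducible
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def relative_errors(estimates, targets, names):
    """Relative error of each named estimate against its (nonzero) target."""
    return {n: (estimates[n] - targets[n]) / targets[n] for n in names}


# Hypothetical targets keyed by "<geography>/<variable>".
targets = {
    "CA/snap": 100.0,
    "TX/snap": 80.0,
    "NY/snap": 60.0,
    "CA/medicaid": 200.0,
    "TX/medicaid": 150.0,
}
fit_names, holdout_names = split_targets(targets)

# Stand-in for a calibration run that only saw fit_names:
# here every estimate is simply 10% above its target.
estimates = {name: value * 1.1 for name, value in targets.items()}
holdout_errors = relative_errors(estimates, targets, holdout_names)
```

Errors on `holdout_names` show how well the calibrated weights generalize to targets they were never fitted to.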

Developer Experience

  • GEO_STACKING=true environment variable for specialized pipeline
  • make data-geo target for geo-stacking workflow
  • Extensive documentation in geo_stacking_calibration/ directory
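
For readers wondering how a flag like `GEO_STACKING=true` is typically consumed, here is a minimal sketch. The helper name and the extra accepted spellings (`1`, `yes`) are assumptions; the PR only documents the value `true`.

```python
import os


def geo_stacking_enabled(environ=None):
    """Return True when GEO_STACKING is set to a truthy string.

    Accepting "1" and "yes" alongside "true" is an assumption of this sketch.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("GEO_STACKING", "")
    return value.strip().lower() in {"1", "true", "yes"}


# The mapping is passed in explicitly so the check is easy to test.
enabled = geo_stacking_enabled({"GEO_STACKING": "true"})
disabled = geo_stacking_enabled({})
```

Leaving the variable unset keeps the default pipeline, which matches the claim that the feature is additive.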

Technical Details

The geo-stacking approach creates geographically representative microdata by:

  1. Stratifying CPS households by demographic/economic characteristics
  2. Calibrating weights to match targets at state and CD levels simultaneously
  3. Validating outputs against holdout targets and hierarchical constraints
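
Step 1 can be sketched as sampling whole households within each stratum, so that members of a household are always kept or dropped together. The record layout, the income-based strata, and the 50% sampling fraction below are hypothetical choices for illustration.

```python
import random
from collections import defaultdict


def stratified_household_sample(persons, stratum_of, fraction, seed=0):
    """Sample whole households within each stratum.

    persons: list of dicts with a "household_id" key.
    stratum_of: function mapping a household's member list to a stratum label.
    """
    households = defaultdict(list)
    for person in persons:
        households[person["household_id"]].append(person)

    strata = defaultdict(list)
    for hh_id, members in households.items():
        strata[stratum_of(members)].append(hh_id)

    rng = random.Random(seed)
    kept = set()
    for label in sorted(strata):
        ids = sorted(strata[label])
        n_keep = max(1, round(len(ids) * fraction))  # keep at least one per stratum
        kept.update(rng.sample(ids, n_keep))
    # Filtering on household_id keeps every member of a sampled household.
    return [p for p in persons if p["household_id"] in kept]


# Hypothetical person records: (household_id, income, household size).
persons = [
    {"household_id": hh, "income": income}
    for hh, income, size in [
        (1, 20_000, 2), (2, 25_000, 1), (3, 30_000, 3),
        (4, 150_000, 2), (5, 180_000, 4), (6, 210_000, 1),
    ]
    for _ in range(size)
]


def income_stratum(members):
    return "high" if members[0]["income"] >= 100_000 else "low"


sample = stratified_household_sample(persons, income_stratum, fraction=0.5)
```

Sampling household IDs rather than person rows is what preserves household structure: a household is never split across the sample boundary.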

See policyengine_us_data/datasets/cps/geo_stacking_calibration/GEO_STACKING_TECHNICAL.md for detailed methodology.
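
As a rough illustration of calibrating to state and CD targets at the same time, the sketch below uses raking (iterative proportional fitting). Raking is a stand-in chosen for brevity; this description does not say which optimizer the PR uses, and the households, CDs, and target values are hypothetical.

```python
def rake(weights, margins, n_iter=100):
    """Rescale weights repeatedly so each margin's weighted total matches its target.

    margins: list of (member_indices, target_total) pairs.
    """
    w = list(weights)
    for _ in range(n_iter):
        for members, target in margins:
            current = sum(w[i] for i in members)
            if current > 0:
                factor = target / current
                for i in members:
                    w[i] *= factor
    return w


# Four hypothetical households: 0 and 1 live in CD "A", 2 and 3 in CD "B";
# households 0 and 2 receive SNAP.
margins = [
    ([0, 1], 100.0),  # CD "A" household count target
    ([2, 3], 200.0),  # CD "B" household count target
    ([0, 2], 120.0),  # state-level SNAP household target
]
calibrated = rake([1.0, 1.0, 1.0, 1.0], margins)
```

In this toy case the weights converge to values that satisfy all three targets at once. With inconsistent targets raking would oscillate instead, which is one reason hierarchical consistency checks matter.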

Usage

# Run geo-stacking pipeline
make data-geo

# Or directly:
GEO_STACKING=true python policyengine_us_data/datasets/cps/cps.py

Breaking Changes

None - all functionality is additive and behind feature flags.

baogorek and others added 29 commits October 10, 2025 11:05

Add CPS_2025 generation to TEST_LITE pipeline to ensure the test_cps_2025_generates test passes in CI. The test was expecting CPS_2025 to exist but it wasn't being generated in TEST_LITE mode.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

- Add file existence check before opening database
- Convert db_path to absolute path to ensure SQLite can find it
- Add verification step in workflow to catch download failures early

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Replace absolute path with relative path computed from script location
to work in both local and GitHub Actions environments.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Merged 5 separate markdown files into one concise README.md:
- AUDIT.md, GEO_STACKING_PIPELINE.md, GEO_STACKING_TECHNICAL.md,
  PROJECT_STATUS.md, VALIDATION_DESIGN_MATRIX.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

…ruth

- Add get_calculated_variables(), apply_op(), state mappings to calibration_utils.py
- Add get_all_cds_from_database() to replace duplicate SQL queries
- Remove freeze_calculated_vars parameter (use aggregate tolerance instead)
- Update cache clearing to use canonical get_calculated_variables()
- Clean up test files and add test_sparse_matrix_verification.py
- Update README with state-dependent variable documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

- Move all files from geo_stacking_calibration/ to local_area_calibration/
- Update import paths across all modules
- Remove obsolete test_sparse_matrix_builder.py (replaced by test_sparse_matrix_verification.py)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

@baogorek baogorek closed this Dec 9, 2025